A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction

نویسندگان

  • Qi Li
  • Haijun Zhai
  • Louise Deléger
  • Todd Lingren
  • Megan Kaiser
  • Laura Stoutenborough
  • Imre Solti
چکیده

OBJECTIVE The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication-attribute linkage detection in two clinical corpora. DATA AND METHODS We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets, and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard. RESULTS The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora. DISCUSSION AND CONCLUSIONS We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

Blindness in Randomized Controlled Trials

In combination with randomization, blinding or masking is an important factor in randomized controlled trials (RCTs), particularly in trials that assess therapeutic effects. Here an attempt is made to explain blindness and why it is important. In clinical trials, blinding is defined as the condition imposed on a study in which study participants, health care providers and assessors collecting o...

متن کامل

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

متن کامل

Bidirectional RNN for Medical Event Detection in Electronic Health Records

Sequence labeling for extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics including pharmacovigilance and drug surveillance. The state of the art supervised machine learning models in this domain are based on Conditional Random Fi...

متن کامل

UTA DLNLP at SemEval-2016 Task 12: Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports

We propose a deep neural network based natural language processing system for clinical information (such as time information, event spans, and their attributes) extraction from raw clinical notes and pathology reports. Our approach uses the context words and their partof-speech tags and shape information as features. We utilize the temporal (1D) convolution neural network to learn the hidden fe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 20  شماره 

صفحات  -

تاریخ انتشار 2013